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Abstract 

In single-molecule microscopy it is necessary to locate with high precision point sources 
from noisy observations of the spectrum of the signal at frequencies capped by fc, which is just 
about the frequency of natural light. This paper rigorously establishes that this super-resolution 
problem can be solved via linear programming in a stable manner. We prove that the quality of 
the reconstruction crucially depends on the Rayleigh regularity of the support of the signal; that 
is, on the maximum number of sources that can occur within a square of side length about l//c- 
The theoretical performance guarantee is complemented with a converse result showing that our 
simple convex program convex is nearly optimal. Finally, numerical experiments illustrate our 
methods. 


1 Introduction 

The problem of super-resolution arises in many areas of science and engineering including mass- 
spectrometry, radar imaging, and wireless communication. In optics, for example, the natural 
resolution of microscopes is inversely proportional to the wavelength of light used for observation. 
This happens because of the diffraction of light, and makes it fundamentally difficult to study sub¬ 
wavelength features of the object; e.g. to resolve nearby sources located at distances smaller than 
the diffraction limit. This paper is about this problem: namely, the super-resolution of positive 
sources, e.g. fluorescing molecules as in single-molecule imaging. 

Formally, consider a high-frequency signal 

= X - Wj) (1) 

i 

consisting of positive point sources located at unknown positions w* and of unknown intensity 
Xj > 0. The signal is observed through a convolution of the form 

s{v) = j fio^{v - w)x{w)dw + z{v), (2) 

where /low(') is a low-frequency kernel that erases the high-frequency components of the signal and 
z{-) is noise. The goal of super-resolution is to accurately estimate x(-), i.e., the source locations 
and intensities. 
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1.1 Super-resolution microscopy 

Since our mathematical models and theoretical results are motivated by very concrete contempo¬ 
rary problems in single-molecule imaging, we find it best to pause and introduce some background 
material; for details beyond those we provide below, please check the wonderful book by J. Good¬ 
man [1], 


Object 


Optical System 

I- 

_l_ 



Entrance Pupil Exit Pupil 

$(w, t) 


Detector 




Eigure 1: Model of an optical system. 


To understand where (2) comes from, we derive the input-output relation of a simple imaging 
system as shown in Eigure 1. While the laws of optics are governed by Maxwell’s equations, which 
are linear, the vectorial nature of the electric and magnetic fields can be neglected in Eourier optics 
and the physics fully described via the time-varying phasor [1, Sec. 3.2], a term assigned to any of the 
three components of these two fields. Assume that a narrow-band (not necessarily monochromatic) 
light is used for illumination, and let $(w, t) and 'I'(v, t) respectively denote the input/output 
phasors describing the field emitted by the object being imaged and the field generated at the 
receiver of the system. Here, w, v G are indexing spatial coordinates in the object plane and 

in the detector plane, respectively, and t G M is indexing time. We assume, for convenience, that 
the phasors $(w,t), 'I'(v,t) have been frequency-shifted (as a function of t) to be centered around 
the mean frequency of the optical wave [1, p. 132], so that, for example, E{w,t) = 3^[<l>(w, 
where E is one of the components of the electric field and 9 is the average frequency of emitted light. 
The diffraction of light in the optical system can be described by the Eraunhofer approximation 
leading to [1, Eq (6-6)] 

'I'(v, t) = J h{v — 'w)^{'w,t)dw, (3) 

where h{v) is the point-spread function (PSE) of the optical system. In general, the Eourier 
transform of /i(-) is proportional to the indicator function of the aperture and because the aperture 
is finite, h{-) is band-limited. To be concrete, assume that the entrance and the exit pupils in 
Eigure 1 are square. In this case [1, Sec. 6.2.2] 


/i(v) oc 


1 sin(27r/efi) 1 sin(27r/cn2) 


V = [VI,V2]^. 


( 4 ) 


The spatial frequency cut-off of the optical system is given by 


J _ sin(6') 

Jc - ^ > 

where A is the wavelength of emitted light (average wavelength in the narrow-band illumination 
case) and Q is half of the angle spanned by the exit pupil as seen from the center of the image plane 
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(see Figure 1). Note that due to the narrow-band illumination assumption, /i(-) depends upon the 
average wavelength of the optical wave, but not upon the specific frequencies in the illuminating 
spectrum so that the system model is described by the simple convolution equation (3). 

In optics, the carrier frequency v ~ 500 THz is much higher than the frequency /het ~ 10 GHz, 
which electronic components can respond to, e. g. the frequency of heterodyne used to down-convert 
the signal. Consequently, in optics only the time-average of the instantaneous intensity of received 
light (called received intensity) is directly observable [1, Eq (6-8)]: 

s(v) = (^'(v,t)^'*(v,t)), (5) 

where '!'*(•) denotes the complex conjugate of '!'(•) and (•) stands for time averaging: 

rl//HET 

(9(i)) = /het / 

Jo 

In a majority of microscopy applications, the object emits incoherent light. Mathematically, this 
situation is described by assuming that frequencies of spatially separated emitters vary in statis¬ 
tically independent fashions. This idealized property may be represented by the equation [1, Eq 
(6-14)] 

(<I>(w, t)$*(w/1)) = (I(-w — w')x(w). (6) 

The quantity x(w) is the time-average of the instantaneous intensity of light emitted by the object 
and is called emitted intensity. Substituting (3) into (5) and then using (6) we obtain the following 
input-output relation 

s(v) =//iow(v - w)x(w)dw, /iow(v-w) = l/i(v-w)l^. (7) 

Observe that (7) is a linear convolution equation with respect to emitted intensity; compare to (3), 
which is a linear convolution equation with respect to the components of the emitted held. The 
low-frequency kernel /low(') is the square of the two-dimensional (2D) sine kernel (4) and has a 
spatial frequency cut-off at fc = 2/c (twice that of the kernel h{-)). The emitted intensity x(-) is a 
nonnegative function, a property that is crucially important for all results in this paper. Finally, 
the ii norm of the signal, 

lk(-)lli = j \x{w)\dw, 

has the meaning of cumulative emitted intensity or total energy of light emitted per second. As 
a side remark, note that when the sample is illuminated by coherent light, as in X-ray crystal¬ 
lography, the resulting input-output relations is no longer linear, in stark contrast to (7), and 
the phase retrieval problem needs to be solved. For the interested reader, this point is explained 
in Appendix B. 

Our goal is to reconstruct the signal x(-) from the observations s(-) in (7). Without additional 
structural assumptions on x(-), this is clearly not possible, because the high-frequency components 
of x(-) are lost. The details of x(-) that are smaller than the Rayleigh diffraction limit, ^ 1.22//c, 
cannot be distinguished [1, Sec 6.5.2]. In single-molecule microscopy [2-4], a modern imaging 
technique, the signal x(-) consists of several disjoint molecules emitting light. Here, the size of 
each molecule is about 4nm, which is much smaller than l//c ~ 200 nm, and yet it is absolutely 

^The specific value of the constant, 1.22, is largely a historical convention; the point here is that the details of the 
image that are much smaller than 1/fc are blurred. 
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necessary to estimate the locations of these molecules with precision that is significantly higher 
than the Rayleigh diffraction limit. 

The main contribution of this paper is to show that under the structural model (1), it is possible 
to estimate x{w) via linear programming stably from noisy data—all imaging systems are funda¬ 
mentally noisy—with resolution beyond the diffraction limit. Further, the quality of estimation 
fundamentally depends on how regularly (in the sense explained in Section 2) the sources/molecules 
are distributed in the image domain. 

1.2 Mathematical models and methods 

The super-resolution theory developed in this paper is discrete, which means that the input signal 
x{-) is assumed to be supported on a fine grid. The nonzero elements of this discrete signal are 
suggestively called “spikes”. In optics, there is no grid, of course; the spikes in (1) can be in arbitrary 
(continuous) locations, and the companion paper [5] shows how to generalize our key result to the 
continuous setting. In truth, the analysis of the continuous-space problem is far more technical 
than that presented here; however, the final result—the stability estimate in (18) —is essentially 
the same. For now, the advantage of working with a discrete model is that we can explain the key 
concepts without bothering with heavy mathematical machinery. 

1.2.1 Discrete setup 

A noiseless discrete model is of the form 

s = Qx, (8) 

where x is either a one- or two-dimensional discrete array of intensities, Q models the (discrete) 
convolution equation and s is the output data, assumed to be of the same dimension(s) as the input 
vector X. We have already seen examples of PSFs or convolutions; for instance, /low(') in (7) is 
the square of the sine kernel (in each direction), the sine kernel being an ideal low-pass filter whose 
frequency response is a box function. Therefore, the frequency response of /low(') is a triangle 
function in ID and a pyramid in 2D. In (10), (12), (16) and (17) below, we consider natural PSFs 
in one and two dimensions so that in the remainder, Q in (8) or (9) may be given by any of these. 

1.2.2 Noise 

In modern microscopy applications, the intensities of emitted/received light are very low and in 
such regimes, the main source of noise is due to quantum-mechanical effects. We have argued 
that a component of s represents the expected number of photons to be recorded per unit time 
at a given pixel on the detector. The actual number of photons detected may be modeled as 
a Poisson-distributed random variable so that s ~ Pois (s) , meaning that we have independent 
Poisson variables with means given by (8). In this paper, we shall work with a slightly more 
general signal-dependent additive noise z = s — Qx so that the input-output (10) relation becomes 

s = Qx -I- z. (9) 


1.2.3 Recovery 

Our recovery method from the observations s in (9) is extremely simple; solve 

minx ||s“Qx||i s.t. x > 0. (CVX) 
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In other words, we are looking for a superposition of positive sources such that the mismatch in 
received intensities is minimum. Note that this method does not make any assumption about the 
signal and does not make use of any knowledge other than the received data s and the PSF Q. 
Furthermore, (CVX) is a simple convex optimization program, which can be recast as a linear 
program since both s and Q are real valued. 


1.2.4 Examples of PSFs 

We now discuss various models for the discrete convolution equation (9). 


ID model with flat spectrum. In our first example, x = [xq • • • is a one¬ 

dimensional array, and Q is an ideal low-pass filter in the sense that it has a flat spectrum with a 
sharp cut-off at fc- Formally, 


Q = Qflat,iD = F'^Qflat.ioF, 


( 10 ) 


where 


\k,l 


1 

y/N 


—i2'Kkl/N 


-N/2 + l<k< N/2, 0<1<N-1, 


is the N X N discrete Fourier transform (DFT) and Qfiat,iD = diag([p_ 7 v/ 2 +i'' 'PN/ 2 V) with 


Pk 


1, k = - fc, 

0, otherwise. 


( 11 ) 


The wavelength Ac — l//c gives the width of the convolution kernel represented by Q. We assume 
throughout the paper that N is even for simplicity. 


ID model with triangular spectrum. The discrete one-dimensional analog of our imaging 
system with incoherent light (7) is given by (8), where Q is as follows: 

Q = Qtri,lD = F'^Qtri.loF, (12) 


Qtri,iD = diag(q) with q = [q_j^i 2 +i ''' qN/2V and 


qk 


fl-^ 

J ^ /c-Hl 

lo. 


k = - fc, ■■■,fc 

otherwise. 


(13) 


In this model, the nonzero elements of x represent the molecules at the corresponding locations (on 
the grid) whereas the components of s represent the intensity of light measured at the corresponding 
pixel on the detector. 


2D model with flat spectrum. Similarly, the 2D model with a flat spectrum reads 

S2D = F 2 DQ 2 DF 2 DX 2 D, (14) 

where F 2 D ^ x —)> C'^ x C'^ is the linear operator that implements the 2D Fourier transform 

and acts according to 

N-lN-l 

|F2DX2Dlfe,fe = ^ E E 

q=o h=o 
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and Q 2 D : X C'^ —>■ x is the diagonal operator in the Fourier domain, 

[Q2Dy2D]fci,fc2 = Pki-N/2Pk2-N/2 [y2D]fci,A:2- (15) 

To keep the same notation, define x = vec(x2D) and s = vec(s2D); where the vec(-) operation stacks 
the columns of a matrix into a tall vector. Using the properties of the Kronecker product, (14) can 
be written as (8) with 

Q = Qflat,2D = (F'^ <8) F'^)(Qflat,lD ® Qflat,lD)(F (8)F). (16) 

2D model with triangular spectrum. With the vectorized notation, the 2D model with tri¬ 
angular spectrum can be written as (8) with 

Q = Qtri,2D = (F*^ <8) F'^)(Qtri,lD ® Qtri,lD)(F ® F). (17) 


1.2.5 Intensity normalization 

It follows from our earlier discussion that for incoherent light (models with triangular spectra), 
we may interpret ||x||i as the total intensity of light emitted by the object. Similarly, ||s||i is the 
total intensity of light observed at the receiver. Letting [qo • • • qw-i] denote the columns of Q, (13) 
guarantees that ||q/||i = 1 for all 1. To see this, first note that q; is a shifted version of ;^F'^q so 

that ||q/||i = II ;^F'^q||i. Next, write q = d*d where d = [d_i^/ 2 +i ‘'' dN/ 2 ]^ 

4 = /'\/^5'’ ^ =-/c/2, • • • ,/c/2, 

[ 0 , otherwise. 


and * denotes the discrete convolution. Finally, use the convolution theorem to conclude 


Iqdli = 


1 


Vn 


F^^q 


1 


Vn 


F^(d*d) 


FHdOF^d 


= ||F^^d||2 = l, 


where 0 denotes the element-wise product and a takes conjugate element-wise. Therefore, using 
that X; > 0 and q/ > 0 for all I, 


N-l 


N-1 


N-l 


Si = 


Xiqi ^ ^ a^dlqdli = '^^1 = ll^lli- 


1=0 


1=0 


1=0 


Hence, our normalization is such that the intensity of light (emitted energy per second) is conserved 
in the system. In the models (11) and (15) with a flat spectrum the ii norm of the signal is not 
conserved. 


1.3 Notation 

Sets are denoted by calligraphic letters A,B, and so on. Boldface letters A, B,... and a, b,... 
denote matrices (or linear operators) and vectors, respectively. The element in the i-th row and 
j-th column of a matrix A is Uij or [A]jj, and the i-th element of the vector a is ai or [a]j. For a 
vector a, diag(a) stands for the diagonal matrix that has the entries of a on its main diagonal. The 
superscripts and stand for transposition and Hermitian transposition, respectively. For a finite 
set X, we write |X| for the cardinality. For two functions /(•) and g{-), the notation /(•) = 0{g{-)) 
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means that limsupi_^oo \f{t)/g{t)\ is bounded. For x G M, [x] = min{m G Z | m > x}. We use 
[I: k] to designate the set of natural numbers {1,1 + 1,, k}. The expectation operator is E [•]. 

For a vector a G C"", ||a||i = |“j| ||a ||2 = denote the ii and £2 norms, 

respectively; ||a|| means either ||a||i or ||a|| 2 . The number of nonzero elements of a vector a is 

||a||o. For a matrix A G the operator norm is defined as ||A||i^op = maxj |aij| and 

vec(A) denotes the n^-dimensional vector obtained by stacking the columns of A. For vectors a and 
b, a 0 b denotes the element-wise product; a*b denotes the discrete convolution; the Kronecker 
product of matrices A and B is denoted as A0B. 

2 Main results 

Consider the ID model for concreteness. From (11), (13) we see that we have access to n = 2/c -|- 1 
low-frequency observations while the total number of degrees-of-freedom in x is N. The ratio 
SRF = N/n is called the super-resolution factor (SRF); this is the ratio between 1/n and 1/N, the 
scale at which we have data and that at which we wish to see details. 

As we will review below, the sparsity condition ||x||o < n/2 is sufficient for recovery of x when 
there is no noise. If there is noise, it turns out that sparsity is not sufficient as our ability to 
estimate x from s in a stable way fundamentally depends on how regular the positions of the spikes 
are, i.e., how many spikes may be clustered close together. 

2.1 Rayleigh regularity 

Suppose we are in D dimensions and think of our discrete signal x G as samples on the D- 
dimensional grid {0, l/N ,..., 1 — 1/N}^ C T^, where is the D-dimensional (periodic) torus— 
the circle in ID. In this paper, we can think of the ambient dimension D as being either one or 
two. We introduce a definition of Rayleigh regularity inspired by [6, Def. 1]. 

Definition 1 (Rayleigh regularity). Fix N,n and set Ac = l//c = 2/{n— 1). We say that the 
set of points T C {0, l/N ,..., 1 — l/N}^ C is Rayleigh regular with parameters {d,r) and 
write T G TZnid, r; N, n) if it may be partitioned as T = 71 U ... U 7)^ where the T/s are disjoint, 
and each obeys a minimum separation constraint: 

1. for all 1 < i < 7 < r, 7) n 7)- = 0; 

2. for all square subsets V C of sidelength d\c/2 and all i. 


7)nT>| < 1. 


When no ambiguity arises, we will shortly write TZoid, r) instead of TZoid, r; N, n). 

With a slight abuse of notation, it is also convenient to define a set of Rayleigh regular signals 
(and nonnegative Rayleigh regular signals) with parameters {d,r): 

TZD{d,r) = {x G : supp(x) G TZD{d,r)}, 

TZ/^{d,r) = {x G : supp(x) G TZD{d,r)}, 

where supp(x) is the support of x (the locations on grid where x does not vanish). 
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Remark. Intuitively, in ID, x G TZi{d,r) simply means that the signal x contains no more than 
r spikes in any d consecutive Nyquist intervals; a Nyquist interval being of length Ac/2, which 
corresponds to the Nyquist-Shannon sampling rate of a signal that is band-limited to [—/c,/c]- 
Figure 2 illustrates these concepts for different parameter values.^ 

We discuss some examples of Rayleigh regular signals and first consider x G This 

signal may contain one spike per Nyquist interval. Each spike is associated with two unknown 
parameters: location and amplitude. Since there are n Nyquist intervals, we may have as many as 
2n unknown parameters in total, which is more than the number n of observations (cf. (13), (11)). 
Hence, recovery of x G 1) is in general not possible even in the noiseless case. If we however 

knew the locations of the spikes but not the amplitudes, we could recover the signal x G 1^1(1,1) 
by solving a system of linear equations. 

Next take x G 7?.i(2,1). Such a signal may only contain one spike per two Nyquist intervals. 
Hence, the total number of unknown parameters is at most equal to the number of observations 
and recovery of x G 7^i(2,l) is barely possible in the noiseless case. For example, as discussed 
in Section 3, x can be recovered by Prony’s method. In general, x G TZi{2r,r) is the absolute limit 
for recovery of complex-valued signals in the noiseless case in the sense that x G TZi{2r — e,r), 
e > 0, is in general not recoverable. 

Strictly speaking, the general dimension-counting considerations above do not hold for positive 
signals x G because the positivity of x supplies extra information. On the one hand, it is 
nevertheless possible to construct adversarial signals x G TZi (2r — e, r) that will not be recoverable 
by any method whatsoever. On the other hand, this paper shows that x G TZi (3.74r, r) can be 
recovered stably in the presence of (small) noise via the linear program (CVX). 


2.2 Stable recovery 

We are now ready to present our main results; although they extend to higher dimensions, they 
are stated in 1 and 2D for simplicity. Throughout, we assume that the data s is given by (9). 

Theorem 1 (Flat spectrum). In ID, take Q = Qflat,iD cLnd x G 77)’'(3.74r,r) with > 128r. 
Then the solution x to (CY'K) obeys 


|x — x||i < C ' 


N 


n — 1 


2r 


C ■ SRF 


2r 


(18) 


where C = C'i(r), only depends on r (if SRF > 3.03/r, it can he taken as in (39)). 

In 2D, take Q = Qflat, 2 D; x G 77^(4.76r, r) with fc > 512r. Then (18) holds with a constant C 
depending on r only, which we do not specify for brevity. 

The result in Theorem 1 is not sensitive to the exact choice of the kernel Q and remains valid 
for just about any other low-frequency kernel. To illustrate this point and to connect our theory 
to super-resolution microscopy we now give the result for the PSF discussed in Section 1.1. 

Theorem 2 (Triangular spectrum). Set 1/2 < a < 1. In ID, take Q = Qtri.iD o-nd assume 
X G IZ(({3.74:r/a,r) with fc > 256r. Then the bound (18) holds with a finite constant C = C'i(r, a), 
namely, 

||x-x||i < C-SRF^^ • ||z||i. 

(If SRF > 3.03/r, then the constant can be taken as in (/I).) 

^Clearly, TZi{d,ri) C 77i(d,r 2 ) for ri < r 2 and 77i(di,r) C Ti\{d 2 ,r) for di> d 2 . 


( 19 ) 




7^l(4,l) 



7^l(8,2) 



7^l(12,3) 



Figure 2: Examples of discrete N dimensional signals from the Rayleigh classes 7^i(4,1), 7^i(8, 2), 
7^1 (12,3) depicted on the grid {0, 1/A^, ..., 1 — 1/A^} C T. The sine wave sin(27r/ct) at the highest 
visible frequency is shown in blue for reference. Here, N = 92 and n = 23, so that SRF = 4 and 
Ac = 1/11. By periodicity, the endpoints are identified. 


In 2D, take Q = Qtri, 2 D; x G 7^^(4.76r/Q;, r) with fc > 1024r. Then except for the numerical 
value of the constant, the same conclusion holds. 

When a —>■ 1, C'i(r, a) —>■ oo, which reflects the fact that, as seen from (13), the spectrum 
of Qtri,iD is very small at the border of the interval [—fc,fc\- Hence, with noise, the spectral 
components of the signal can only be observed away from this border, for example on the interval 
[—0.9/c, 0.9/c], which corresponds to taking a = 0.9 in Theorem 2. 

Implications for single-molecule microscopy. Consider Theorem 2 in 2D and remember that 
||z||i is the cumulative difference in light intensity between noiseless (ideal) and real observations. 
Then the theorem tells us that the cumulative error in light intensity in signal estimates is bounded 
by the amplified version of the cumulative error in light intensity in the data. The noise amplification 
factor (NAF) behaves as SRF^^, where r is the parameter describing the regularity of the signal 
support. If the noise level is sufficiently small and the signal is sufficiently regular (r is small), i.e., 
not too many molecules are clustered close together, and SRF is modest, then the algorithm (CVX) 
is guaranteed to achieve excellent super-resolution results. As we will explain in Section 2.3, no 
algorithm can perform substantially better. 

Contribution. Theorems 1 and 2 are new, and while their proofs are given in Section 4, we would 
like to discuss the main technical contribution of this paper. When r = 1 or, equivalently, when the 
spikes are separated by at least 1.87Ac and not necessarily positive, a result similar to Theorem 1 was 
obtained in [7, Th. 1.5] using a different convex program, see also [8] for a continuous-space version; 
this program, given by (LI) below, requires knowledge of an upper bound on ||z||i. The proof in [7] 
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is based on constructing a (dual) low-frequency trigonometric polynomial that interpolates the 
sign of the spikes. The crucial observation we make in this paper is that the technique developed 
in [7] can be extended to the important setting when the spikes are not separated and positive. 
The proof is based on a simple idea: a Rayleigh-regular set may be partitioned into subsets with 
points in each subset separated by at least 1.87Ac; therefore, each set comes with a (dual) low- 
frequency trigonometric polynomial constructed in [7] ; multiplying such polynomials together gives 
a low-frequency polynomial interpolating the signal. 

In the noiseless setting (z = 0), our results state that the recovery is exact. In ID this is 
well known, see [9,10] and the review in Section 3. In 2D and higher, this is new: as explained 
in Section 4, this result cannot be obtained by a straightforward generalization of the techniques 
in [9,10]. 

2.3 Tightness 

In this section we argue that our results in Theorem 2 are nearly tight 
answers to the following two natural questions: 

(i) Can the assumption C = (3.74r, r) be substituted with C 

without changing the bound (19)? 

(ii) Can the exponent 2r in (19) be made smaller? 

2.3.1 Tightness of the length of the interval 

To answer (i), we have already argued in Section 2.1 that even in the noiseless case it is not possible 
to recover many of the signals x G [d, r) with d < 2r. Hence, d = 3.74r is within a factor 1.87 of 
the optimum. This factor comes from the key result from [7] explained above, which concerns the 
existence of low-frequency polynomials interpolating complex scalars of unit magnitude separated 
by 1.87Ac. Any improvement in this technology would yield a corresponding improvement here, see 
Section 4.1.3 for additional details. 

2.3.2 Tightness of the exponent 

To answer question (ii) above, we need the concept of modulus of continuity (MC). 

Definition 2 (Modnlus of continuity). Let ]]•]] be a norm, Q a linear operator, and C a class 
of signals.^ The MC is defined as 

^ llxi - X 2 II 

Xi®x 2 ^ec ||Q(xi -X 2 )ll' 

We also introduce the simple notion of noise amplification. 

Definition 3 (Noise amplification factor). Let ]]•]] be a norm, B a linear operator, and C a 
signal class. Suppose an algorithm A produces an estimator x(s) from the model s = Bx -|- z 
obeying the uniform stability guarantee 

llx-xj] < NAF[A,C,B] -S 

for all X G C and all z with jjzjj < 5. Then we say that the NAF of A is (at most) NAF[H,C,B]. 
^For example, C may be a class of sparse signals, a class of Rayleigh regular signals, and so on. 



. In ID, we are interested in 
= TZf{d,r) with d < 3.74r 
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The MC is related to the NAF via the following simple facts. 

1. If the NAF of an algorithm A is at most NAF[A, C, Q], then 

NAF[A,C,Q] > MC[C,Q]. 

2. Consider the exhaustive search (ES) algorithm (in general intractable) for super-resolving 
signals in C: 


find X G C s.t. ||s — Qx|| < S (ES) 

with 6 chosen so that ||z|| < S. The NAE of this algorithm satisfies 

NAE[ES,C,Q] < 2MC[C,Q]. 


We now provide a lower bound on the MC showing that if the noise is arbitrary, no algorithm 
can have a NAE smaller than CSRE^^”^. Therefore, the exponent 2r in (18) is nearly optimal. 

Theorem 3. Take Q = Qtri,iD- S'et r and d to be arbitrary numbers and C = Tlf{d,r) so that 
by taking d = oo we would have at most r spikes. Then there exist signals x = [xq • • • xat-i]"'', x = 
[xo • • • xn-iV ^ C s.t. 

||x — x||i = 1 

and when N,n ^ oo, N/n ^ SRE 


||Q(x-i)||i^x(r,SRE)SRE2--i. 


For SRE —oo, 


x(r,SRE)^CL(r) 

where Ciir) depends on r only and is given explicitly in (45). 
in Definition 2, 

MC[C,Q] > CL(r)SRE2’'-i 
when N, n and SRE = N/n are large. 


Consequently, letting 


be ll'lli 
( 20 ) 


The proof, given in Appendix A, relies on an explicit construction of nonnegative signals x and 
X with disjoint supports and such that the spikes in x — x cancel out as much as possible after 
low-pass filtering. 

Comparing Theorems 3 and 2, we see that the exponent of SRE in the right-hand side (RHS) 
of (18) is within one unit of the best possible. It is important to point out that the convex 
optimization algorithm in (CVX) knows nothing at all about the regularity of the signal class C. 
Yet, it is adaptive in the sense that it has nearly optimal stability guarantee whatever the (usually 
unknown) value of r. 

Theorem 3 tells us that the MC increases exponentially with r. Eor example, for a practically 
interesting case where SRE = 8, it is not difficult to estimate from (20) and the numerical value 
of the constant that super-resolution could only be possible if r < 5. Eor r > 5, the modulus of 
continuity is greater than 10^, setting unrealistic constraints on noise levels in practical applications. 
This is even an optimistic estimate and, in reality, it is nearly impossible to separate more than 
three sources packed in a Nyquist interval. 

It is not known whether the exponent in the lower bound (20) is sharp. In the very special case 
where the signal contains exactly one spike, it is not difficult to see that a simple matched-filter 
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will have a bounded ratio NAF/SRF, matching the exponent in the RHS of (20). This can be 
used in the setting where the spikes are guaranteed to be so far apart, that the overlap between 
their images in the output space can be neglected; this only happens when the distance between 
neighboring spikes far exceeds Ac/2 and all the spikes have roughly the same magnitude. In general, 
in the interesting case where the images of neighboring spikes can overlap in the output space, it is 
not clear how one could close the small gap between (20) and (18). In fact, it is possible that the 
exponent in (20) can be made larger. As we shall see, to construct adversarial signals x, x in the 
proof of Theorem 3, we only use signals that contain exactly r spikes each. However, the signals in 

{d, r) can have more than r spikes, of course, which could allow one to construct pairs x, x that 
give a larger bound than that in the RHS of (20). Please also see the recent preprint [11], where 
the question of calculating the exact exponent for signals with a total of r spikes is addressed. 

3 Literature review and innovations 

3.1 Prior art 

Algebraic methods. Prony’s method [12] is an algebraic approach for solving the ID super¬ 
resolution problem from noiseless data when the number of sources is known a priori. The data 
s is used to form a trigonometric polynomial, whose roots coincide with the spike locations. The 
polynomial is then factored, thus revealing those locations, and the amplitudes estimated by solving 
a system of linear equations. In the noiseless case, Prony’s method recovers x perfectly provided 
that ||x||o < n/2. No further Rayleigh regularity assumption on the signal support is needed. 
With noise, however, the performance of Prony’s method degrades sharply. The difficulty comes 
from the fact that the roots of a trigonometric polynomial constructed by an algebraic method are 
completely unstable and can shift dramatically even with small changes in the data. 

Many noise-aware versions of Prony’s method are used frequently in engineering applications, 
for example in radar (see [13, Chapter 6]). The most popular methods are MUSIC and its numerous 
variations [14-19], matrix-pencil [20], and ESPRIT [21,22]. For more details on algebraic methods 
we refer the reader to the excellent book [13, Chapter 4]. However, the stability of noise-aware 
algebraic methods is not theoretically well-understood. Asymptotic results (at high SNR) on the 
stability of MUSIC in the presence of Gaussian noise are derived in [23,24]. More recently, some 
steps towards analyzing MUSIC and matrix-pencil in a non-asymptotic regime have been taken 
in [25] and in [26], respectively. Nevertheless, to the best of our knowledge, no strong theoretical 
stability guarantees like those in Theorems 1 and 2 are available for algebraic methods. Hence, 
the search for super-resolution methods that perform well empirically and have sharp theoretical 
stability guarantees is an important open problem. 

Algebraic methods have been generalized to the multi-dimensional case. Surprisingly, the gen¬ 
eralizations are not straightforward and many methods ( [27], [28], [13, Sec. 4.9.7]) have very 
restrictive sparsity constraints: namely, at most n spikes when we recall, that in 2D the total num¬ 
ber of observations is v?. In [29], the number of spikes can be as large as n^/4 in the noiseless case, 
as one would expect from dimension-counting considerations. 

Fundamental limits. In the pioneering work [6], Donoho studied limits of performance for the 
ID super-resolution problem. His main findings can be summarized as follows. Put H-H = 11-112 in 
the definition of NAF and Q = Qflat.iD- 
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• Let C = 7?.i(4r, r), then the NAF of the exhaustive search algorithm (ES) obeys 

NAF[ES,C,Q] < C(r)SRF2'-+i, (21) 

where C(r) is a positive constant that might depend on r but not on N or n. 

• Take an arbitrary pair {r,d) and set C = TZi{d,r). Then 

MC[C,Q] > C{r)SRF^^-\ 

where C(r) is a positive constant that might depend on r but not on N or n. 

To the best of our knowledge, the analysis in [6] has not been generalized to the multi-dimensional 
case. Unfortunately, The algorithm (ES) is not feasible because C is not convex, and [6] does not 
propose any tractable algorithm that would have NAF bounded above by the RHS of (21). In this 
respect, the key question posed by Donoho is whether a feasible algorithm that achieves stability 
in (21) exists. 

Other works [30-32] study the stability of the super-resolution problem in the presence of noise, 
but likewise do not provide a tractable algorithm to perform recovery. Work in [33-35] analyzes 
the detection and separation of two closely-spaced spikes, but does not generalize to the case when 
there are more than two spikes in the signal. 

Super-resolution under minimnm separation constraint. Progress towards resolving the 
question posed in [6] in the general situation where x G C'^—in this paper we consider the case 
only—has recently been made [7,8]. Put ]]•]] = jj-jji in the definition of the NAF, select the PSF 
with a flat spectrum, Q = Qfiat,iD, and consider C = 77.i(4,1). It was shown that the NAF of the 
.^i-minimization algorithm 

minjjxjji s.t. jjs — Qxjji < 5 (LI) 

X 

with 5 chosen so that jjzjj < 5 is at most 

NAF[L1,C,Q] < C-SRF^, (22) 

where C is a positive numerical constant. The condition x G C = 77.i(4,1) is restrictive because it 
means that the signal x cannot contain spikes that are at a distance less than 2Ac. [For real-valued 
signals x G a minimum separation of 1.87Ac suffices.] This is a limitation for many applications 
including single-molecule microscopy, as it is usually understood that the goal of super-resolution 
is to distinguish spikes that are (significantly) closer than the Rayleigh diffraction limit, i.e. at 
a fraction of Ac apart. Unfortunately, if there are spikes at a distance lower than this value, ii 
minimization does not, in general, return the correct solution even if there is no noise. Results 
in [7,8] also cover the multi-dimensional case under a minimum separation constraint. On a similar 
line of research, see [36] and [37] for related results on the denoising of line spectra and on the 
recovery of sparse signals from a random subset of their low-pass Fourier coefficients. The accuracy 
of support detection under the minimum separation constraint is analyzed in [38,39]. 

Super-resolution of noiseless nonnegative signals. The case of ID nonnegative signal, x G 
was analyzed in [9], see also [10] for a shorter exposition of the same idea. Adapting to our 
setting, the result in [9] can be summarized as follows: put ]]•]] = jj-jji in the definition of NAF and 
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Q = Qflat,iD- Let C be the class of all signals with ||x||o < n/2. Then the NAF'^™ of the convex 
feasibility program 

find X > 0 s.t. ||s —Qx||2<(5 (F) 

with b chosen so that ||z ||2 < (5, is a finite positive constant. The exact dependence of NAF'^'"^ on 
N and n is not specified in [9]. As we will see, further examination of the proof from [9] leads to a 
bound of the form 

NAF"™[C,Q] < (23) 

where C is a numerical constant. First, this does not depend on the Rayleigh regularity of x but on 
the sparsity. Second, this does not depend on the SRF but on the grid size. By comparing to (21) 
and (22) we see that the bound (23) is weak. Indeed, consider the interesting case N,n ^ oo with 
N/n = SRF kept constant. In this case the bounds in (21) and (22) remain finite, whereas the RHS 
of (23) converges to +oo very quickly. The bound in (23) does not depend on the frequency cut-off 
fc or equivalently the number n of pieces of information we are given. Whether the frequency 
cut-off is 10 or 10® the bound remains the same! This cannot capture the right behavior. 

3.2 Innovations 

The novelty of our results can be summarized as follows. 

• As compared to algebraic methods. Theorems 1 and 2 show that efficient algorithms can 
recover the signal in a provably stable fashion. As we discussed earlier, strong worst-case 
stability guarantees are not available for algebraic methods. The flipside is that our results 
crucially rely on non-negativity of the signal; algebraic methods do not need this assumption. 

• As compared to [6], our recovery algorithm is a simple linear program (LP) and, hence, 
is tractable whereas the exhaustive search method of [6] is intractable and cannot be used 
in practice. The difference between stability exponents in (18) and in (21) stem from the 
fact that [6] works with the £2 norm while we work with ii. (The stability bounds for the 
exhaustive search algorithm in [6] do not assume the signal to be nonnegative.) 

• As compared to [7], our results do not rely on the restrictive minimum-separation assumption. 
Having said this, the results in [7] hold for the general case of complex amplitudes, and our 
proofs borrow heavily from the tools developed in that work. 

• As compared to work in [9], our stability estimates are far stronger, for they depend on the 
super-resolution factor, and not on the spacing on the fine grid. Further, if one tries to use 
the proof technique used in [9] to generalize the noiseless results in [9,10] to the 2D case, one 
would need to assume that our image has at most n/2 spikes; this is too restrictive. In sharp 
contrast, we see from Theorems 1 and 2 that if the signal support is Rayleigh regular, we may 
have a number of sources on the order of n^, i.e. on the order of the number of measurements. 

4 Proofs 

4.1 Proof of Theorem 1 in the ID case 

The proof of the theorem is based on the following lemma. 
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Lemma 1. Assume that the assumptions of Theorem 1 are satisfied. Set 


h = [ho - ■ ■ hN-i]^ = X — X 


and 


T={l/N:hi<0}, 


(24) 


and suppose there exists q = [(^o • • • Qn-i] £ and 0 < p < 1 such that Qq = q, ||q||oo < 1; and 


Then 


qi = 0, l/N G T 
qi > 2/3, otherwise. 


/ 2(1-p) „ „ 

X —X h < - • Z h. 


(25) 


(26) 


Proof. Set q = [go • • • gAf-i]"*" = — P and note that ||q||oo < 1 — p since p < 1/2. On the one hand, 

Kq,h)| = |(Qq,h)| = |(q,Qh)| 

< ||q||oo||Qh||i 

< (1 — p)||Qx — s + s — Qxjji 

< (1 - p)(||Qx- sjji + jjs - Qxjji) 

< 2(1 - p)||Qx - sjji 

= 2(l-p)-||z||i. (27) 

On the other hand, using sign(g/) = sign(/i/) for all I gives 


l(q, h)| 


N-l 




/=0 


N-l 


N-l 


qih = \qi\\h\ > /j||h|| 


1=0 


1=0 


(28) 


Combining (27) and (28) yields the conclusion. 


□ 


4.1.1 Localization of trigonometric polynomials 

Lemma 1 shows that in order to obtain a tight bound we need to construct a (dual) vector q 
obeying ||q||cxD < 1 and (25) with p as large as possible. First, observe that since x, x > 0 it follows 
that T in (24) satisfies |T| < || xjjo < n/2. The idea is to construct a real-valued trigonometric 
polynomial of largest frequency fc (recall n = 2/c -|- 1) 

/c 

q{t) = ^ qke~'^'^^^ G M for all t, (29) 

k=-U 


obeying ||g||oo < 1, 

J qi^) = 0, for all t G T, 

|g(t) > 0, for all t ^ T, 

and set q = {g(//iV) : / G [0 : — 1]}. Observe that such a q would obey the conditions of Lemma 1 

with p = i arg mmi/^^-j-{q{l/N)}. 
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A classical approach to constructing such a polynomial q{t) is 

Qit) =11^ [cos(27r(t - to) + tt) + 1 ]. 
toeT^ 


(30) 


This approach, used in [9,10], works whenever |T| < n/2 since the degree of q{t) is then at most 
fc- The problem is that q{t) in (30) grows extremely slowly around its zeros, making p very small, 
which then translates into highly suboptimal stability estimates. To demonstrate this, assume that 
T = {0}, i.e., \T\ = 1. Then (see Figure 3) 

q{t) = ^ [cos(27rt + vr) + 1] (31) 


so that 


9(VA^) < ^ 


2 

and p < 2 ^- Plugging this into (26) we get an estimate no better 


than 


X — X 


2 n 

^ _ AT^ 


(32) 


This is weak. In the case when x has one spike, the separation condition T G 7^i(3.74,1) of [7] is 
trivially satisfied. The results in [7] guarantee that li minimization achieves 

X — X 1 < C • • z 1 = C ■ SRF^ • z b, 

where C* is a numerical constant. The reason why [7] provides stability guarantees far stronger 
than (32) is that the trigonometric polynomial q{t) constructed in [7] grows around its zeros much 
faster than q{t) in (31). We review the behavior of q{t) constructed in [7] in Lemma 2 below and 
illustrate the difference between this polynomial and that in (31). Based on the results of [7], we 
then present a novel construction for q{t) that does not rely on the minimal separation condition 
T G 77i(3.74,1) needed in [7] and works for all signals with Rayleigh regular support of the type 
T G 77 . 1 ( 3 .74r, r). At the same time, the new polynomial q{t) grows rapidly around its zeros, which 
allows us to derive strong stability guarantees. 


4.1.2 Main building block: q{t) under separation 

The following lemma is an immediate consequence of [7, Lm. 2.5] adapted to the case of real-valued 
signals as explained in [7, Sec. 2.5]. 

Lemma 2. Assume T G 77i(3.74,1; iV, n). As before, fc = {n — l)/2,Ac = l//c and suppose 
fc > 128. Then there exists a real-valued polynomial q{t) = Ikiloo < 1 such 

that 

{ (l{t) = 0) for all t £ T, 

\ q{t) > 4>{t), for all t, 


where (see Figure 3) 


and Cl 


cl){t) 


cifcito - ^)^ 
ci/c(c2Ac)^ = cicl, 


0.029, C2 = 0.17. 


for all t s.t. G T with 
otherwise, 


7o| < C2 Ac 


(33) 
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Figure 3: Comparison of trigonometric polynomials at two different scales: the polynomial from [7] 
(see Lemma 2) bounces off zero much faster than that used in [9,10]. 


The significance of this lemma is that the growth of q{t) around its zeros is nearly optimal. 
Indeed, suppose we wish to construct a real nonnegative polynomial with highest frequency fc of 
the form (29) of magnitude at most one, and which grows around its zeroes as fast as possible. How 
fast could it possibly grow? Since q{t) is a superposition of harmonic functions, it cannot outpace 
a pure harmonic—normalized to take on values in [0,1]—at the highest available frequency. Hence, 
we cannot hope for growth faster than 

^ [cos(27r/c(t — to) + vr) + 1] Ri tt^/c (^ “ for small t. (34) 

Comparing (34) to (33), we see that Lemma 2 provides a construction that is optimal up to at 
most a constant factor. 

We now show how to extend the construction in Lemma 2 to the case where the elements of 
T are not necessarily well-separated, but T is Rayleigh regular. Together with Lemma 1, this will 
prove Theorem 1. The proof below is illustrated on Figure 4, which the reader is encouraged to 
consult while following the argument. 



Figure 4: Illustration of the proof of Theorem I for r = 2; 71 = {ti,t 2 ,t 3 }] Ti = {^ 25 ^ 4 }- The 
trigonometric polynomials qi{t), q 2 {t) satisfy qi{t) = 0 for all t G 71 and q 2 {t) = 0 for all t G Ti; 
they are not displayed. The lower bounds <?ii(t) and </> 2 (t) defined in (35) are depicted. 
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4.1.3 Construction of q{t) without separation 

Take x G TZ^ (3.74r, r) with a support of cardinality S. Define h and T = with ti < 

t 2 <■■■< ts as in Lemma 1. Since hi can only take on negative values on supp(x), then T G 
77i(3.74r, r). Consider the partition T = where Ti = {Uk+i Since T G 77i(3.74r, r), 

Ti G 77i(3.74r, 1) and by rescaling, 


77i(3.74r, 1; N, re) = 77i(3.74,1; N, re), 

where h = [n — l)/r + 1. Set^ fc = {h — l)/2 = (re — l)/(2r) = fc/r and Ac = l//c = r/fc- 
By Lemma 2, there are real-valued polynomials qi{t; NT) = t with ||(7i||oo < 1 and 

K — — Jc 

jQiit) = 0) for all t £ T, 

\qi{t) > Tit), for alH, 


where (see Figure 4) 


Tit) = 


Clfc ito - t)"^, 


for all t s.t. 3to G T with \t — to| < C 2 Ac 


Clfc (c 2 Ac)^ = C 1C2, otherwise. 


(35) 


The trigonometric polynomial q is obtained by taking the product of the qfs: 

r 

Qit) = 

i=l 


By construction, qit) is band-limited, i.e., g(t) = J2k=-f^ QkC 


^—i27rkt 


< 1, and 


qito) = 0, for all to G T, 

Qi't) > riLi Mt), for all t. 


(36) 


Next we further lower-bound 01=1 Tit)- Fix t and let A/" = {ti,..., tf} = {t G T : |t — t| < C2Ac}. 
Since T G 77i(3.74r, r), it follows that r < r. Let to be the closest element of AA to t so that 

(to — t)^ < iU — t)^ for all z = 1,..., f. By the definition of M, cifc iU — t)^ < cic|. Using (35) 
and these inequalities we may write 


X{Tit)>icicir-^TjfX{iti-tf 

i=\ i=l 


> 


2r 

Clfc iT — t)^^, for all t s.t. f > 0 
(C1C2)'’, otherwise. 


(37) 


The assumption SRF > 3/r implies SRF = N/n > l/(2rc2) ~ 2.94:/r, which is equivalent to 
l/N < C2r2ln so that 1/N < C2Ac. Therefore, from (36) and (37), it follows that 


1 I ~2r 

p = - arg min{g(//A^)} > c\-fc 

^ ijN^rr ^ 


iV2r 


— J 

— C\ 


1 


■/, 


2r 


1 


2r2r''r ^2r 


— Cl 


2(2r) 


2r 


re — 1 
N 


2r 


(38) 


^ Strictly speaking, this requires fc/r to be an integer. If fc/r is not an integer, we can substitute fc with r[/c/rj 
and repeat the argument for the new fc. Since fc > 128r by assumption, this transformation will result in less than 
a 1% change meaning that 77i(3.74r,r) would need to change into 77i(3.77r,r). To keep things simple, we ignored 
this detail throughout the paper and implicitly assumed that fc/r is an integer. 
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Plugging this into (26) gives 


||x - x||i < 4 c 3^ (n^) ' 

Ci(r) 

where C 3 = 4/ci = 67.79. This completes the proof. □ 

Remark (Possible Improvement). The constant 3.74 in x G 77^(3.74r,r) comes from the fact 
that our construction is built on top of Lemma 2 borrowed from [7, Sec. 2.5]. If the constant 
3.74 in Lemma 2 is reduced, all our results automatically improve without any modification. Car¬ 
los Fernandez-Granda privately shared with us [40] that it is possible to substitute 3.74 by 2.52 
in Lemma 2. 

Remark (Extension). Consider a signal consisting of spike clusters as shown in Figure 5, and 
violating the separation constraint. Suppose that within each cluster, the spikes have the same 
sign. Then our proof technique can be used to show that if the clusters are sufficiently separated, 
the signal can be recovered stably by convex programming. We omit the details. 



Figure 5: Signal with clustered spikes that have the same sign within a cluster but different signs 
across clusters. 


4.2 Proof of Theorem 2 in the ID case 

Our strategy is to reduce the problem to that in which we have a flat spectrum. Choose 1/2 < a < 1 
(a parameter we can optimize) so that afc is an integer, and define the filter 


Tk 


A 


' /c+1 

/c + l-|fc|’ 

^ ifc + 1) {a\k\ + b) , 

. 0 , 


k — Oifc^ • • ■ ) 

\k\ =a/c + l,...,/c, 

otherwise. 


with 


A 

a = -- 


and b = 


/c(l - a) + 1/c(l - a) fc{l - a)+ 11 - a’ 

Set R = diag(f) with f = [r_jv/ 2 -i-i • • • 'f^N/2V R = F'^RF. The point is that 

T = RQ = F^^RFF^^QF = F^^RQF = F^TF 

has a spectrum T = diag([£_jv/ 2 -i-i ‘ ‘ ‘ iN/ 2 V) given by 


ik = 


!■> k — n/c) ■ ■ ■ 1 

(/c + 1 - \k\) {a\k\ + b) , \k\ = a/c + 1, ..., fc, 

0 , otherwise. 

V. ’ 
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7 ^( 9 . 52 , 2 ) 


> 4.76A, 


(a) (b) 

Figure 6: In (a), T G 77.2(9.52,2) has eight elements and can be decomposed as T = 71 U Ti with 
71,72 G 77.2(9.52,1). The points in 71 are in blue and those in Ti are in green. In (b), T has eight 
points belonging to a line parallel to a coordinate axis; this is a worst-case scenario. 


Note that the spectrum of T is flat in the region —afc < k < afc- Next, construct q as in Section 4.1 
with fc replaced by afc- Because q is band-limited to afc, Tq = q. On the one hand, 

|(q,h)| = |(Tq,h)| = |(q,Th)| 

< ||q||oo||Th||i 

< (1 — /3)||RQx — Rs -b Rs — RQx||i 

< (1 - p)||R||i,op (IIQx - s||i -b ||s - Qx||i) 

< 2(1 - p)||R||i,op||Qx - s||i 

< 2(1 - p)C{a) ■ ||z||i. 

The last step follows from Appendix C, where we show that for all N and all SRF, 

||R||i,op < Cia) A 2a + ^ + (40) 

Note that C{a) is finite as long as a < 1. For a = 1/2, C(a) = 7.22. For a = 0.75, C{a) = 18.38. 
On the other hand, |(q, h)| > /9||h||i as before, where p is given in (38) with the substitution 
n — 1 = 2/c 2afc- In conclusion, 

C = Ci{r,a) = Ci{r)C{a)(^^^ . (41) 

4.3 Remarks on Theorems 1 and 2 in 2D 

The proof of the 2D version of Theorem 1 closely mimics that in the ID case. Since x G 77^(4.76r, r), 
we can work with a partition T = Ui<j<r.71 with 71 G 772(4.76r, 1). This is illustrated in Figure 6 
for r = 2. The proof follows the same steps as in Section 4.1. The dual trigonometric polynomial 
is constructed as a product of r polynomials. The i-th term in the product has zeros on 71 and 
is constructed using Lemma 4 given in Appendix D for completeness; this lemma is a 2D version 
of Lemma 2, and its proof can be found in [7, Prop. C.l], [8, Sec. D.l]. The proof of Theorem 2 in 
the 2D case follows the steps outlined in Section 4.2 with slight modifications, which are omitted. 

To the best of our knowledge, Theorem 1 is the first result, in the noisy and in the noiseless 
setting, showing that the 2D super-resolution problem can be solved via convex optimization when 
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the signal is nonnegative, without assuming a separation condition. It is instructive to discuss this 
point in details as the discussion reveals interesting insights about the super-resolution problem in 
higher dimensions. 

Suppose one would like to obtain a noiseless result for nonnegative signals similar to that 
in [9,10]. Following Section 4.1.1 one could take a 2D version of the polynomial in (31), 

- [cos(27rti -|- tt) -|- 1] [cos(27rt2 -|- vr) -|- 1] (42) 

and then form a product of such terms to build a low-frequency polynomial q{ti,t 2 ) as done in 
(30). What is the largest number of terms the product in (30) could contain in the 2D setting? 
Since each term of the form (42) costs two units in the frequency domain in each variable, to be 
able to guarantee that q{ti,t 2 ) has frequency no larger than fc in both variables, one can have no 
more that n/2 terms of the form (42). Because each term in the product is zero only at one point 
of the support, this technique would not guarantee recovery of signals with more than n/2 spikes. 
This is discouraging since we have have observations. It is easy to see that ||x||o < n/2 is a 
tight bound in the worst case: think about the situation where the support is located along a line 
parallel to a coordinate axis as in Figure 6(b). In this case, even though we have observations, 
the problem is essentially one-dimensional with n observations and no more that n/2 spikes can 
possibly be resolved. 

However, what happens in the typical situation where the spikes are Rayleigh regularly spread 
over the domain as in Figure 6(a)? In this case, we construct the trigonometric polynomial, which is 
a product of r terms as in the 2D version of (37). Each term in the product has frequencies at most 
n/r and vanishes at ~ r? jr^ points of the support simultaneously. For example, in Figure 6(a) all 
the elements in 71 are roots of the hrst term and all those in Ti are roots of the second. Hence, as 
Theorem 1 shows, the number of spikes can be as large as 

(n — 1)^ r? 

4.762r2 ^ ~ 4.762r’ 

i.e. for a fixed value of r, ||x||o may grow linearly with the number of observations. This is a much 
stronger result compared to what would be achievable via the method from [9,10]. 

5 Numerical results 

This section introduces a numerical simulation to illustrate the effectiveness of our method in super¬ 
resolution microscopy. Set /c = 19, SRF = 10, so that N = 390 and consider the 2D model with 
Q — Qtri,2D' 

• The image x of dimensions 390 x 390 dimensional is shown in Figure 7b. This image contains 
five different regions with different source densities: (i) the top-left quarter is a signal from 
77 . 2 ( 4 . 28 , 1 ); (ii) the top-right quarter is from 77.2(2.14,1); (iii) the lower-left quarter is from 
77 . 2 ( 4 . 28 , 2 ); (iv) the lower-right quarter towards the center is from 772(2.24,2); (v) and the 
lower-right quarter towards the corner contains three closely co-located spikes. All spikes 
were chosen to have equal magnitude set to 10,000. To be clear, we are performing one 
large experiment in which different regions of x exhibit different spike densities; we run the 
reconstruction algorithm only once. (Overall, the signal would need to belong to 772(-,3) 
since it contains three spikes in a Nyquist cell.) 

• The observations, displayed in Figure 7a, are generated according to the model s = Pois (Qx). 


21 




We solve the LP (CVX) by smoothing the objective into /i^(s — Qx), where is the Huber 
function defined as /i^(y) = Xli where 




|t| — fj,/2, |t| > /i. 


This is a smooth approximation to the ii norm, and is tight when ^ is small [41]. To make sure 
our approximation is really tight, we set = 0.1 • ||-v/s||i/X^ « 0.1 • ||s — Qx||i/X^. We then solve 
the smooth problem using Lan/Lu/Monteiro’s primal-dual first order method [42] with a solver 
written in the framework provided by TFOCS [43]. There are two implementation details worth 
mentioning. First, we start the algorithm from an initial guess obtained by the frequently used 
continuation method. That is, we solve a series of three smoother problems (so that convergence 
is faster) with /x G {IO^/xq, IO^/xq, IO/xq}, each time taking the solution to the previous problem as 
an initial guess. To solve these intermediate problems, the stopping criterion is a relative (.2 error 
between two consecutive iterations below 10“^ or a number of iterations reaching 1000, whichever 
occurs first. Second, for the value of ^ we perform 15,000 iterations of the Lan/Lu/Monteiro’s 
method to obtain a precise solution. This is an overkill but at the same time, this guarantees that 
we are solving (CVX). For information, the total computational cost is about 40,000 2D fast Fourier 
transforms (FFTs) of size 390 x 390. 

The signal estimate is displayed in Figure 7c. In Figure 7d we zoomed-in to six interesting 
domains of the images in Figure 7a-Figure 7c, which are marked by white boxes in Figure 7a. In 
each series of three images in Figure 7d we present the data, the original signal, and the estimate 
produced by (CVX). 

As we can see, in the regions (i), (ii), (iii) the algorithm performs very well, resolving even the 
closely located pairs of spikes in region (iii) (please see the zoomed-in vignettes). In region (iv) 
the algorithm fails in many places, and region (v) is very poorly resolved. The reason for the poor 
resolution in regions (iv) and (v) is that in (iv), the average density of spikes is too high. In region 
(v) there are too many spikes located within one Nyquist cell. 


6 Conclusion 

When a signal is positive and Rayleigh regular, then linear programming solves the super-resolution 
problem with near-optimal worst-case performance. Although the results presented in this paper 
assume that the signal is supported on a discrete grid, extensions to the continuum can be found 
in the companion paper [5]. 

A widely open research problem concerns the super-resolution of complex-valued signals. In 
ID, [6] shows that if the signal belongs to 77i(4r, r), then stable super-resolution is possible via 
exhaustive search. If the signal belongs to 77i(4,1), [7] proves that stable super-resolution can be 
achieved via £i-minimization. Is there a computationally feasible algorithm that achieves stable 
super-resolution for signals in 77i(4r, r) with r > 1? If no such algorithm is found, is it possible to 
show that this problem is in some sense fundamentally difficult from a computational viewpoint? 
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Figure 7: (a) Observed data Pois (Qtri, 2 Dx); (b) true signal x; (c) estimate produced by (CVX); 
(d) six zoomed-in vignettes corresponding to the boxes in (a); the rows-of-three in (d) show the 
observed data, the true signal and the estimate in this order. 




A Proof of Theorem 3 


The proof uses the idea of [9, Th. 4] with an important difference: there, the authors provide a 
lower bound on a modulus of continuity defined as sup^^ X 2 ec ll^i “ X 2 ||i/||Qflat,iD(xi — X 2 )|| 2 . In 
our case, we are interested in a lower bound on MC[C, Q] = sup^j X 2 ec ll^i “ X 2 ||i/||Q(xi — X 2 )||i. 

Put h = [ho - ■ ■ /iTv-i]"*" = X — X and s = [sq • • • ■sat-i]"'' = Qh. Then a standard calculation 
shows that Sm can be written as 


S-m 


N-1 

E 

1=0 


Slow 


m — I 
N 


hi, 


(43) 


where 

1 / sin((l + /c)7rt) y 

{1 + fc)N V sin(7rt) ) 

is the Fejer kernel. The idea is to construct h with at most 2r nonzero elements in such a way that 
for each m, the terms in the sum (43) cancel each other out as much as possible. One way to do 
this in a systematic way is to set 



Xk 


^(V)> A:G{0,2,...,2(r-l)} 

0, otherwise. 


— ^k—1 


(with the periodic convention). Obviously, ||x||o = ||x||o = r and setting w = 2r —1 for convenience. 


Ihlh = llx — xlh = — 


1 

1 


1=0 


u 


= 1 . 


With this, 


Srrt. — 


E(-i)' 


1 f U! 


1=0 , 


2‘^ V I 


/ m — uj UJ — I 
Slow I -TT- 1“ 


N 


N 




m — UJ 
N 


where 


A 5 [siow](t) = ^(-1)' + (cu - m 


is the finite-difference operator of order uj applied to the kernel 5iow(')- large N and large SRF, 
A'l/ArbiowKi) ~ a crucial fact allowing us to obtain closed-form estimates on ||s||i. 

Formally, write Sm as a Fourier series 


Sm, — 


I J c 

_ \ ^ i2vmk/N 


N 


where 


and 


k=-fc 


Q{f) = 


k 


/c + 1 


Puj 


1-|/|, /G[-l,l], 

0, otherwise. 


pM) = '^e 

1=0 
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Now let 


N 


f(fc+l)/N / 

tm= 

J-{fc+l)/N V Jc + i- 

It is not difficult to see that for all N 

N/2 

^ ^ \tm 'SmI ^ ^ ^ \im 

m=—N/2-\-l |m|>A^/2 

and, therefore, since the series Ylm=-oo converges, 

N-l 

||s||l = |Sm| = 


pM)df- 


(44) 


m=0 


N/2 
m=—N/2+l 


E I'- 


m=—oo 


when N,n —)• oo with N/n = SRF fixed. Using the fact that q{f) = 0 for |/| > 1/2 and changing 
variables in the integral in (44) we can write 

K/c+l)/Af 


tm — 




'-{fc+l)/N 


/c + 1 




We conclude that as N, n —)■ oo with A^/n = SRF fixed, 

1 1 


s 1 


where. 


1 A 1 1 


X(r, 7?) 2“ 27/ f 


E 


SRF J xir, SRF) 


gi2.W(2.))/^(/) (277)2--Wi(//(277))4f 


-1 


m=—oo 

Since the finite difference operator converges to the derivative operator as <5 —0: 

—A^[51ow] (•) ^-^, <5^0, 


it follows that for every fixed /, 


Therefore, when SRF —)> oo, 
where 


rfpM/v) ^(i27r/)‘^, 7/ ^ oo. 


x(r,SRF)^CL(r) 


( /•OO /•! 

^2.-1 / / ei2-U^(/) 

J-oo J-1 

Direct numerical computation reveals 


dt 


CUr) > 


1.66, r = 1 
1.44, r = 2 
0.92, r = 3 
0.48, r = 4 
_0.24, r = 5. 


(45) 
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B Coherent Optics 


When the illumination is perfectly coherent, the time-varying phasor amplitudes across the object 
plane differ only by complex constants so that we can write 




Plugging this into (5) we obtain 


Scoh(v) oc 


/i(v — w)<h(w)(iw 


(46) 


We see that in a coherent imaging system, the directly observable received intensity, Scoh(v), is a 
nonlinear (quadratic) function (46) of the signal <h(w). 


C Proof of 40 


By definition, R = [ro---rAr_i] is a circulant matrix, and, hence, ||R||i^op 
properties of circulant matrices, 

f = -\/]VPro 


or, equivalently, 


so that 


We use the following lemma. 




|R| 


l,op 


y/N 


iF^fll 


|ro||i- 


Lemma 3. Assume ui is a discrete periodic signal with period N. For each I, let 


Further, by 


vi = ui-ui-i, (47) 

wi = vi- vi-i (48) 


be the first and second differences of ui. Let Uk,Vk,Wk be N-periodic sequences of inverse DFT 
coefficients of ui,vi,wi, respectively. For example, 


Uk 


1 

Vn 


N/2 

i=-Ar/2+l 


Assume that 

N/2 

Yj 

k=-N/2+l 

Then for all k 0 mod N, 


1 A 

^ ~ y/N 2 — 2cos{2TTk/N) 


for all k 0 mod N. 
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Proof. By the theorem about the DFT of first differences [44, p. 223], 


Wk 


= 1^1 _ ^2nik/Ny 


Uk 


and, consequently. 


Uk = 


_ g27rifc/Af^ 


jWk for all k ^ 0, ±N, ±2N ,.... 


Next, observe that 


2^ \wi\ < 


y/N 


l=-N/2+l 


^/N' 


( 49 ) 


(50) 


Substitnting (50) into (49) and using that |1 — exp(27riA:/A^)p = 2 — 2cos(27rA:/A^) conclndes the 
proof. □ 

Set ui = fi, continued periodically with period N, and define vi, wi as in (47) and (48). Observe 
the following facts: 


hfc — 0, /c G [—N/2 + 1 : — /3/c], 

% = (/c + l)|a|, /c G [-/c + 1: - a/c], 

_ /c+l _ 

V-af,+l - ((i_c^)/^+i)((i_«)/^+2) - 

% =-(/c + l)|a|, A: G [a/c + 1:/c], 

hfc = 0, /c G [/c + 1 : N^/2] 

and note that Vk is monotonically increasing on the intervals [— iV/2 + 1 : — afc], [— a/c + 1: «/c] 
and [afc + 1 : N/2]. From this it immediately follows that 


-afc -afc 

\wk\= Y \'^k\ = V-afc - v-f, = {1 + fc)\a\, 

k=-N/2+l k=-fc+l 

l^-afc+ll = JT, - w , r,\ + (/c + l)l«l = \Wafc+l\^ 
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. 10 ((1 “ «)/c + 1)((1 - «)/c + 2 ) ’ 

k=-afc+2. 

N/2 
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k=afc+2 


SO that 


where 
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Furthermore, a direct calculation reveals that 
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Finally, (40) follows from 
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^/N 
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where we used that 
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r-A^/2 
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2 — 2cos{k2TT/N) JN/{fc+i) 2 — 2 cos(/27r/Ai) 
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and 


for all (A+1)>0^ 

[fc + 1) TT 


D Basic Lemma in the 2D case 

Lemma 4. Fix N,n and assume T G 7^2(4.76, l)A^n. Set fc = {n — l)/2,Ac = l//c and suppose 
fc > 512. Then there exists a real-valued trigonometric polynomial 

fc fc 

q{t-,N,n)= Y. Y t = [ti,t 2 ]^, 

fcl=-/c k2=-fc 
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such that ||g||oo < 1? and 


q{t) = 0 , for all t G T 
<?(t) > 0 (t), for all t, 


where 




ci/c l|to ~ till, for all t s.t. 3to G T with ||t — to||oo < C 2 Xc 
C 3 for all t G {t : ||t — to||oo > 02 X 0 for a// to G T} . 


Above, Cl, C 2 , and C 3 are numerical constants. 
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